-
Notifications
You must be signed in to change notification settings - Fork 26
Atlas search lookups #325
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: main
Are you sure you want to change the base?
Atlas search lookups #325
Conversation
449b6a3
to
ca8a7cf
Compare
9935b25
to
a467a57
Compare
ea2118b
to
206b554
Compare
456028d
to
65f22e6
Compare
eb6eb07
to
e7f4d22
Compare
0fdb066
to
eed2499
Compare
eed2499
to
99f6548
Compare
docs/source/ref/models/search.rst
Outdated
``SearchEquals`` objects can be reused and combined with other search | ||
expressions. | ||
|
||
See :ref:`search-operations-combinable` |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I wonder if we could structure things so we don't need to repeat this boilerplate on every(?) expression.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
🤔 I think we cannot scape, unless we list the operations that could be combined in the section of combinable operations. I like to have this link meanwhile I am reading the docs, so it gives an introduction of some (cool?) behaviour
tests/queries_/test_search.py
Outdated
delayedAssertCountEqual = _delayed_assertion(timeout=2)(TransactionTestCase.assertCountEqual) | ||
delayedAssertListEqual = _delayed_assertion(timeout=2)(TransactionTestCase.assertListEqual) | ||
delayedAssertQuerySetEqual = _delayed_assertion(timeout=2)( | ||
TransactionTestCase.assertQuerySetEqual | ||
) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Are the non-delayed versions ever used? Maybe it's better to overwrite the original names so we don't have to write "delayedXXXXX" everywhere. Or maybe the waiting could be done in setUp()
after data is inserted? Unless some test inserts more data, essentially only the first test's waiting is needed, right?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
No, I all the checks are delayed...
Regarding to the second question: right, any test that insert data need to wait. If the data is inserted in the init class, we could only wait once. So If we want to get rid of those delayed, we can wait in the creation part.
tests/queries_/test_search.py
Outdated
|
||
|
||
@skipUnlessDBFeature("supports_atlas_search") | ||
class SearchEqualsTest(SearchUtilsMixin): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I've tried to be consistent in this project about using "Tests" (plural) in the class names.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
🤔 mmh I didn't notice that. will change.
tests/queries_/test_search.py
Outdated
boost_score = SearchScoreOption({"boost": {"value": 3}}) | ||
|
||
qs = Article.objects.annotate( | ||
score=SearchEquals(path="headline", value="cross", score=boost_score) | ||
) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'd inline boost_score
, or at least omit the blank line. (Only some tests are inconsistent.)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Things look great, but I've gone through about half of the code (due to size). I will check the test code tomorrow!
django_mongodb_backend/compiler.py
Outdated
if not has_search: | ||
raise ValueError( | ||
"Cannot combine two `$vectorSearch` operator. " | ||
"If you need to combine them, consider restructuring your query logic or " | ||
"running them as separate queries." | ||
) | ||
raise ValueError( | ||
"Only one $search operation is allowed per query. " | ||
f"Received {len(search_replacements)} search expressions. " | ||
"To combine multiple search expressions, use either a CompoundExpression for " | ||
"fine-grained control or CombinedSearchExpression for simple logical combinations." | ||
) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think these two ValueErrors
need to be switched.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
🤔 the second is the case when:
has_vector_search but it does not has search. I think I should refactor this. It is a bit confusing. the not at the beginning is not helping.
# Apply De Morgan's Laws. | ||
operator = node.operator.negate() if negated else node.operator | ||
negated = negated != (node.operator == Operator.NOT) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This logic is a little confusing because it requires some understanding of negate
and the state changes.
I'll leave this as a comment here to be reviewed later.
What's an example of a NOT combinable?
I.e., how would I construct NOT (A AND B)
or can this only be done via negate
?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I applied De Morgan's Law to get something in the scope of A' operator B'. So:
NOT (A AND B) = Not A or Not B => {SHOULD: [MUST_NOT(A), MUST_NOT(B)]
with minimum should in 1.
The other way to handle this is push everything in a must not, but in order to handle: NOT (NOT A))
as A I decided to apply this kind of simplifications.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
And, I almost forgot, we could handle double negation on should. (That are the ors if the minimumShouldMatch is 1 ). long story short
A and B => MUST
not C => MUST_NOT
A or B => SHOULD with minimumShouldMatch is 1
not (A or B) => not A and not B => MUST(MUST_NOT(A), MUST_NOT(B))
When A, B, C are atomic.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I refactored it a bit. It is simpler now, don't know if it is simple enough 😄
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Overall PR looks great! I've got some minor corrections, but other than that, it is good to merge from me. Great work! 🚀
It also looks like there's a ReadTheDocs error:
/home/docs/checkouts/readthedocs.org/user_builds/django-mongodb-backend/checkouts/325/docs/source/ref/models/search.rst:654: WARNING: unknown document: 'atlas:atlas-search/scoring/' [ref.doc]
docs/source/releases/5.2.x.rst
Outdated
@@ -16,6 +16,12 @@ New features | |||
- Added :class:`~.fields.PolymorphicEmbeddedModelField` and | |||
:class:`~.fields.PolymorphicEmbeddedModelArrayField` for storing a model | |||
instance or list of model instances that may be of more than one model class. | |||
- Added support for MongoDB Atlas Search expressions, including | |||
``SearchAutocomplete``, :class:`.SearchEquals`, ``SearchVector``, and others. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
``SearchAutocomplete``, :class:`.SearchEquals`, ``SearchVector``, and others. | |
``SearchAutocomplete``, :class:`SearchEquals`, ``SearchVector``, and others. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This suggestion isn't correct. Without the leading dot, the class won't be resolved properly.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
🤔 maybe I forgot to add the prefix ~.expressions
. But I don't know much about docs.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It works as is. The dot allows the path to be looked up rather than resolved as an exact match.
tests/queries_/test_search.py
Outdated
def create_search_index(cls, model, index_name, definition, type="search"): | ||
collection = cls._get_collection(model) | ||
idx = SearchIndexModel(definition=definition, name=index_name, type=type) | ||
collection.create_search_index(idx) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
NIT: For the sake of testing, we can make this a blocking call and check for the index before continuing.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
🤔 If I don't understand wrong, I have to add a wait for predicate. Right?
This PR adds the initial implementation of the Atlas operator.
Task: